HAT-Trie: A Cache-Conscious Trie-Based Data Structure For Strings

نویسندگان

  • Nikolas Askitis
  • Ranjan Sinha
چکیده

Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other highperformance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Trie-Join: Efficient Trie-based String Similarity Joins with Edit-Distance Constraints

A string similarity join finds similar pairs between two collections of strings. It is an essential operation in many applications, such as data integration and cleaning, and has attracted significant attention recently. In this paper, we study string similarity joins with edit-distance constraints. Existing methods usually employ a filter-and-refine framework and have the following disadvantag...

متن کامل

P-Trie Tree: A Novel Tree Structure for Storing Polysemantic Data

Trie tree, is an ordered tree data structure that is used to store a dynamic set or associative array where the keys are usually strings. It makes the search and update of words more efficient and is widely used in the construction of English dictionary for the storage of English vocabulary. Within the application of big data, efficiency determines the availability and usability of a system. In...

متن کامل

Construction of the CDAWG for a Trie

Trie is a tree structure to represent a set of strings. When the strings have many common prefixes, the number of nodes in the trie is much less than the total length of the strings. In this paper, we propose an algorithm for constructing the Compact Directed Acyclic Word Graph for a trie, which runs in linear time and space with respect to the number of nodes in the trie.

متن کامل

Efficient Trie-Based Sorting of Large Sets of Strings

Sorting is a fundamental algorithmic task. Many generalpurpose sorting algorithms have been developed, but efficiency gains can be achieved by designing algorithms for specific kinds of data, such as strings. In previous work we have shown that our burstsort, a trie-based algorithm for sorting strings, is for large data sets more efficient than all previous algorithms for this task. In this pap...

متن کامل

Algorithms and Data Structures for Ip Lookups

Ioannidis, Ioannis. Ph.D., Purdue University, May, 2005. Algorithms and DataStructures for IP Lookups. Major Professor: Ananth Grama. The problem of optimizing access mechanisms for IP routing tables is an impor-tant and well studied one. Several techniques have been proposed for structuringand managing routing tables, with special emphasis on backbone, high-throughputrouting. T...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007